ABSTRACT

A discussion is provided of some statistical measures
and graphical information that, when used as feedback to the student,
facilitate his ability to assess his own uncertainty. These measures
and graphs, which result from the application of least-squares
analysis and information theory to decision-theoretic testing,
give the student the capability to compare perceived
information with actual information. The possibility of improving his
ability to communicate uncertainty using the language of probability
is discussed. (Author/RC)
THE STUDENT AS AN ASSESSOR OF UNCERTAINTY
SOME STATISTICAL MEASURES USEFUL FOR FEEDBACK TO THE STUDENT 



For the past year and a half, I have been using computers to
administer decision-theoretic tests. By using a graphics terminal
and by exploiting the nearly instantaneous analytical capabilities of
the computer, a student can be provided with an environment for
understanding the nature of decision-theoretic testing and, possibly, for
improving his ability to communicate uncertainty using the language
of probability.

Figure 1 shows a three-alternative multiple-choice test item
as it appeared on the screen of the graphics terminal. The subject
responds by touching a light pen anywhere within or on the edges of
the triangle. For any response, as indicated by the "X", the computer
displays the possible item scores based on a truncated logarithmic
scoring system (Shuford, Albert & Massengill; 1966).
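A minimal sketch of how the displayed "possible item scores" could be computed. The paper names the rule (truncated logarithmic) but not its constants, so `MAX_POINTS`, `POINTS_PER_DECADE` and `FLOOR` below are hypothetical values chosen only for illustration: certainty in the correct answer earns 100 points, and any probability at or below .01 earns 0.

```python
import math

# Hypothetical constants; the scale actually used on the terminal is not given.
MAX_POINTS = 100.0        # score when the correct answer was given probability 1
POINTS_PER_DECADE = 50.0  # points lost per factor-of-ten drop in probability
FLOOR = 0.01              # probabilities at or below this earn the minimum score


def truncated_log_score(p: float) -> float:
    """Score earned when the answer judged correct was given probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Truncation: probabilities below FLOOR are scored as if they were FLOOR,
    # so a confident wrong answer costs a bounded rather than infinite penalty.
    return MAX_POINTS + POINTS_PER_DECADE * math.log10(max(p, FLOOR))


def possible_item_scores(probs):
    """Score the subject would receive for each answer, were it the correct one."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return [truncated_log_score(p) for p in probs]


if __name__ == "__main__":
    # A response placing .6 on A, .3 on B and .1 on C.
    for answer, score in zip("ABC", possible_item_scores([0.6, 0.3, 0.1])):
        print(f"if {answer} is correct: {score:6.1f}")
```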

Each point on the triangle corresponds to a probability distribution
over the three answers, as illustrated by Figure 2. The subject can
change his response any number of times, and when he is satisfied with
the set of possible scores he can see the correct answer to the question,
as shown in Figure 3. Before moving on to the next question, the
subject sees a cumulative graph of his test score up to that point.
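One natural way to turn a light-pen touch into a probability triplet is to take the barycentric coordinates of the point, assuming each vertex of the triangle stands for certainty in one answer. The sketch below assumes an equilateral triangle with hypothetical screen coordinates (`VERTICES`); the actual geometry of the terminal display is not given in the text.

```python
import math

# Hypothetical geometry: vertices stand for certainty in answers A, B, C.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def point_to_probabilities(x, y, vertices=VERTICES, tol=1e-9):
    """Return the barycentric coordinates of (x, y), read as (pA, pB, pC)."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    denom = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    p_a = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / denom
    p_b = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / denom
    p_c = 1.0 - p_a - p_b
    probs = [p_a, p_b, p_c]
    if min(probs) < -tol:
        raise ValueError("point lies outside the triangle")
    # Clamp round-off on the edges, then renormalize so the triple sums to 1.
    probs = [max(p, 0.0) for p in probs]
    total = sum(probs)
    return tuple(p / total for p in probs)
```

Touching a vertex yields certainty in that answer, the midpoint of an edge splits the probability evenly between two answers, and the centroid yields one-third on each.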

Upon completing a test of 15 to 20 items, the subject sees
an analysis of his test performance, as illustrated by Figure 4. Much of
this analysis is based upon an evaluation of the external predictive
validity of the subject's responses. A subject is, in effect, making
probabilistic predictions as to which answer will be judged correct.
If a subject had responded to a very large number of test items, we
could do an analysis such as that shown in Figure 5. For this subject,
the differences between the observed relative frequency and the ideal
proportion indicated by the dashed line can be attributed to sampling
fluctuations, so we can conclude that he is unbiased in his use of
probabilities. [Strictly speaking, the probabilities should be
treated as triplets, as in Shuford & Brown (1974).]
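The comparison in Figure 5 can be sketched as a calibration table: group the stated probabilities into bins and, for each bin, compare the mean stated probability with the relative frequency of answers judged correct. This sketch treats each stated probability separately rather than as a triplet, which is exactly the simplification the bracketed note warns about; the bin count is an arbitrary choice.

```python
from collections import defaultdict


def calibration_table(stated, correct, n_bins=10):
    """Rows of (mean stated probability, observed relative frequency, count).

    stated  -- probabilities the subject assigned to individual answers
    correct -- 1 if that answer was judged correct, else 0
    """
    if len(stated) != len(correct):
        raise ValueError("stated and correct must have equal length")
    groups = defaultdict(list)
    for p, c in zip(stated, correct):
        # p == 1.0 goes into the top bin rather than a bin of its own.
        b = min(int(p * n_bins), n_bins - 1)
        groups[b].append((p, c))
    table = []
    for b in sorted(groups):
        pairs = groups[b]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        rel_freq = sum(c for _, c in pairs) / len(pairs)
        table.append((mean_p, rel_freq, len(pairs)))
    return table
```

For an unbiased subject the rows lie near the diagonal (relative frequency close to mean stated probability), up to sampling fluctuations that shrink as the counts grow.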

By assuming that the relation between relative frequency and
probability as used by a subject can be approximated by a linear function,
and by using a least-squares estimation procedure (Brown & Shuford; 1973:
Sibley; 1974: Shuford & Brown; 1974), it is possible to make inferences
about a subject's bias from much less data than that used in Figure 5.
Figure 6 illustrates two linear fits, one for a subject (I) who
undervalues his information, the other for a subject (II) who overvalues
his information. These functions are used to eliminate the bias from a
subject's responses by deriving a new set of revised probabilities; e.g.,
whenever subject II stated that the probability of an answer being
correct was one, the revised probability would be changed to match the
relative frequency of .85.
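A minimal sketch of the linear fit and the revision step, assuming plain ordinary least squares of the 0/1 outcome on the stated probability. The cited estimation procedures may differ in detail (in particular, they may treat probabilities as triplets), so the function names and the clipping to [0, 1] are assumptions of this sketch. A revised triplet produced this way need not sum to one.

```python
def fit_realism_line(stated, correct):
    """Ordinary least-squares fit of the 0/1 outcome on stated probability.

    Returns (intercept, slope) with relative frequency approximated by
    intercept + slope * stated. An unbiased subject has intercept near 0
    and slope near 1.
    """
    n = len(stated)
    if n < 2 or n != len(correct):
        raise ValueError("need at least two paired observations")
    mean_p = sum(stated) / n
    mean_y = sum(correct) / n
    sxx = sum((p - mean_p) ** 2 for p in stated)
    if sxx == 0:
        raise ValueError("stated probabilities must not all be equal")
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(stated, correct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_p
    return intercept, slope


def revise_probability(p, intercept, slope):
    """Replace a stated probability by its fitted relative frequency,
    clipped to [0, 1]."""
    return min(1.0, max(0.0, intercept + slope * p))


if __name__ == "__main__":
    # Hypothetical data: one stated probability per answer, 1 if judged correct.
    stated = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    correct = [1, 1, 1, 0, 0, 0, 0, 0]
    a, b = fit_realism_line(stated, correct)
    print(a, b, revise_probability(1.0, a, b))
```

A fitted line that passes through .85 at a stated probability of one maps every "certain" response to .85, which is the kind of correction described for subject II.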

These revised probabilities are used to compute a new test score,
which, if the subject is biased, will be larger than his original test
score. The difference between these two scores is the basis for the
statement "YOU CAN IMPROVE YOUR SCORE BY 37 POINTS BY MORE
REALISTIC USE OF YOUR KNOWLEDGE." shown in Figure 4. The
difference between this new test score and a perfect score is the
basis for the statement "YOU CAN IMPROVE YOUR SCORE BY 224 POINTS
BY MORE STUDY."
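The two feedback statements can be sketched as simple differences of total scores. The scoring constants below are a hypothetical truncated logarithmic rule (100 points for certainty in the correct answer, 0 points at or below .01), and the function and key names are illustrative, not those of the Rand program.

```python
import math

# Hypothetical truncated logarithmic rule: 100 points when the correct answer
# was given probability 1, 0 points at or below probability .01.
MAX_POINTS = 100.0


def item_score(p):
    return MAX_POINTS + 50.0 * math.log10(max(p, 0.01))


def feedback(original, revised, keys):
    """original, revised: one probability triplet per item;
    keys: index of the answer judged correct on each item."""
    score = sum(item_score(t[k]) for t, k in zip(original, keys))
    revised_score = sum(item_score(t[k]) for t, k in zip(revised, keys))
    perfect = MAX_POINTS * len(keys)
    return {
        "score": score,
        "revised_score": revised_score,
        # Basis of "... BY MORE REALISTIC USE OF YOUR KNOWLEDGE."
        "realism_gain": revised_score - score,
        # Basis of "... BY MORE STUDY."
        "study_gain": perfect - revised_score,
        "perfect": perfect,
    }


if __name__ == "__main__":
    original = [(1.0, 0.0, 0.0), (0.2, 0.8, 0.0)]
    revised = [(0.85, 0.1, 0.05), (0.3, 0.6, 0.1)]
    r = feedback(original, revised, keys=[1, 1])
    print(f"YOU CAN IMPROVE YOUR SCORE BY {round(r['realism_gain'])} POINTS "
          "BY MORE REALISTIC USE OF YOUR KNOWLEDGE.")
    print(f"YOU CAN IMPROVE YOUR SCORE BY {round(r['study_gain'])} POINTS "
          "BY MORE STUDY.")
```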

These revised probabilities are also used to estimate the actual
amount of information (Shannon & Weaver; 1949) the subject possesses
with respect to the test. This absolute measure is rescaled and
displayed as "ACTUAL KNOWLEDGE", as shown in Figure 4. "PERCEIVED
KNOWLEDGE" is, of course, computed using the original probabilities
given by the subject. The subject in Figure 4 undervalues his
knowledge because his test performance indicates he actually possesses
more information than he thinks he does.
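The paper does not spell out its information formula, so the following is one plausible reading, labeled as an assumption: the information in a triplet is log2(k) minus its Shannon entropy, i.e., how far the triplet is from the uniform "know nothing" distribution over k answers, averaged over items and rescaled (on a hypothetical 0 to 100 scale) as a percentage of log2(k). Perceived knowledge applies it to the stated triplets; actual knowledge applies it to the revised triplets, normalized here to sum to one.

```python
import math


def item_information(probs):
    """Bits of information in one triplet relative to the uniform
    ("know nothing") distribution: log2(k) - H(probs)."""
    total = sum(probs)
    probs = [p / total for p in probs]  # normalize, e.g. for revised triplets
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(len(probs)) - entropy


def knowledge(triplets):
    """Mean information per item, rescaled to 0..100 as a percentage of
    the maximum log2(k) (a hypothetical display scale)."""
    k = len(triplets[0])
    mean_bits = sum(item_information(t) for t in triplets) / len(triplets)
    return 100.0 * mean_bits / math.log2(k)


if __name__ == "__main__":
    stated = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]
    revised = [(0.85, 0.1, 0.05), (0.45, 0.45, 0.1)]
    print("PERCEIVED KNOWLEDGE", round(knowledge(stated)))
    print("ACTUAL KNOWLEDGE", round(knowledge(revised)))
```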

This analysis in terms of actual vs. perceived information, as shown
in Figure 7, is closely related to the old Arabian proverb:

He who knows, and knows that he knows,
He is wise, follow him.

He who knows, and knows not that he knows,
He is asleep, awaken him.

He who knows not, and knows not that he knows not,
He is a fool, shun him.

He who knows not, and knows that he knows not,
He is a child, teach him.

What will actually happen when people are allowed to express their
knowledge in terms of probabilities? Hopefully, we will find wise
men and children, possibly a few sleepers, but certainly no fools.
I wish I could give a definitive answer to this question. All I have
are some tentative but suggestive results reinforced, fortunately, by
some of the findings that Dave McMullen will report later in this
symposium.

At Rand we have demonstrated, and tried out, computer-administered
decision-theoretic testing with many different people, using as sample
tests Reader's Digest vocabulary tests; Humanities, Natural Sciences,
and Social Sciences items from a workbook for the College Level
Examination Program tests; and a mid-term post-graduate level test in
Econometrics. About halfway through these demonstrations we decided
to begin keeping a permanent record of what people were doing at the
terminal.

Figure 8 compares the two information measures for the first test
taken by each of 66 individuals. Most of the data points fall below the
diagonal, indicating that most of the "subjects", at least initially,
overvalue their knowledge of these subject matter areas. A few people
fall close to the diagonal, suggesting that there may exist some people
who can discriminate what they know well from what they know less well
with a high degree of accuracy.

What happens when people take more tests and, thus, gain more
experience with decision-theoretic testing? We find that many of
them can reduce their score loss due to lack of realism
(Sibley; 1974). I think that this improvement comes as they begin
to experience the consequences of the admissible scoring system
(Shuford, Albert & Massengill; 1966) and learn to reduce their
risk-taking tendencies by making their utilities more nearly linear
in points earned or lost. There does, however, appear to be a limit
to this improvement.

A number of people were encouraged or challenged to take more
tests and to try to be as realistic and to score as well as they
possibly could. It should be remembered that there is no conflict
between these goals when an admissible scoring system is used (Shuford
& Brown; 1974). So I now have 11 subjects who have taken an
appreciable number of tests, enough that I could discard the early ones
taken while they were learning the procedures and the consequences of the
admissible scoring system.
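The claim that honesty and a high score do not conflict can be checked numerically for the untruncated logarithmic rule. If a subject's beliefs are q, the expected score of reporting r is the sum of q_i log r_i, which by Gibbs' inequality is maximized only at r = q. The grid search below is a sketch of that check for three answers; it assumes utility linear in points and does not analyze the truncated rule used on the terminal.

```python
import math


def expected_log_score(belief, report):
    """Expected natural-log score of `report` when the correct answer is
    distributed according to the subject's own `belief`."""
    total = 0.0
    for q, r in zip(belief, report):
        if q > 0:
            if r == 0:
                return -math.inf  # a possible answer was given probability 0
            total += q * math.log(r)
    return total


def best_report(belief, steps=20):
    """Grid search over three-answer reports (i/steps, j/steps, rest/steps);
    returns the integer grid counts of the report with the highest
    expected score."""
    best, best_value = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            rest = steps - i - j
            value = expected_log_score(belief, (i / steps, j / steps, rest / steps))
            if value > best_value:
                best, best_value = (i, j, rest), value
    return best


if __name__ == "__main__":
    # With beliefs (.6, .3, .1), the best report on a 1/20 grid is the belief itself.
    print(best_report((0.6, 0.3, 0.1)))
```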

Figure 9 shows the apparently stable-state behavior of the most
biased of the 11 subjects. The line designated I_A is located at the
mean of the actual information measures, while the line designated I_P is
located at the mean of the perceived information measures. The intersection
of the two lines gives a gross indication of actual vs. perceived
information for those tests the subject decided to attempt. By taking
the ratio of I_P to I_A we can obtain a rough measure of the extent and
direction of bias. The ratio for this subject is 2.44, indicating that
she thought that she had almost two and one-half times as much information
as she actually had.
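The bias ratio is simply the mean perceived information divided by the mean actual information over a subject's tests. A minimal sketch, assuming the per-test information values have already been computed (the function name is hypothetical):

```python
from statistics import mean


def bias_ratio(perceived, actual):
    """I_P / I_A: mean perceived information over mean actual information
    across a subject's tests. Above 1 suggests overvaluing, below 1
    undervaluing, and 1.00 no bias."""
    i_p = mean(perceived)
    i_a = mean(actual)
    if i_a <= 0:
        raise ValueError("mean actual information must be positive")
    return i_p / i_a
```

On this measure the subject in Figure 9, at 2.44, overvalues her information, while a ratio of 1.00 would indicate an unbiased subject.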
Figure 10 tabulates some personal characteristics of the 11 subjects,
ordered by decreasing bias, which goes down almost to the
unbiased value of 1.00. Notice that no subject yielded an overall
ratio less than one, which would indicate a person who typically
undervalued his information. Figure 11 compares the information measures
for subject B. Although apparently striving to reduce bias and to
improve his score, this subject was also unable to do so. Figures 12
through 18 show subjects with decreasing amounts of bias who were more
and more often successful in producing a realistic assessment of their
uncertainty. Figures 19 and 20 are for the two most accurate subjects,
who were remarkably consistent in demonstrating their ability to
accurately assess their uncertainties.

In conclusion, the introduction of decision-theoretic testing
makes it possible to define and to measure for the first time a human
ability, call it realism, which may prove to be a very important
determinant of individual and team performance. For example, to what
extent and in what manner is an unrealistic student handicapped in his
attempts to learn and to study effectively? For another example, does
a team of realistic individuals tend to outperform a team of overvaluing
individuals and, if so, for what types of tasks? Answers to these and
many other questions must await further research.

I have shown here that some people can be very realistic over a
wide range of subject matter while others characteristically overvalue
their information. We do not yet know what deficits in this ability
exist within different subgroups of the population, nor do we know to
what extent people can be educated to become more realistic, or what
it would take. The results summarized in Figure 10 certainly prove
that level of education does not ensure realism in assessing and
communicating uncertainty.

The decision theorist L. J. Savage, in his posthumously
published article on the "Elicitation of Personal Probabilities
and Expectations" (Savage; 1971), correctly conjectured that people
would be found who tended to overvalue their information. I suspect
that the remainder of his statement will also prove to be prophetic
and a useful guide for future research and applications of
decision-theoretic testing. For this reason, I repeat it here.

"Though requiring more student time per item, these [decision-theoretic
testing] methods should result in more discrimination
per item than ordinary multiple-choice tests, with a possible net
gain. Also, they seem to open a wealth of opportunities for the
educational experimenter.

"Above all, the educational advantage of training people, possibly
beginning in early childhood, to assay the strengths
of their own opinions and to meet risk with judgment seems
inestimable. The usual tests and language habits of our culture
tend to promote confusion between certainty and belief. They
encourage both the vice of acting and speaking as though we were
certain when we are only fairly sure and that of acting and
speaking as though the opinions we do have were worthless when
they are not very strong."